Identifying Thesis Statements in Student Essays: The Class Imbalance Challenge and Resolution

Authors

  • Fattaneh Jabbari
  • Mohammad Hassan Falakmasir
  • Kevin D. Ashley
Abstract

A thesis statement or controlling idea is a key component of the Common Core State Standards of writing from grade 6 to grade 12. We developed a machine learning model to identify thesis statements in students’ essays in order to focus peer-reviewers on commenting on the presence and quality of an author’s thesis statement. Identifying thesis statements in essays can be considered as a classification task in which a classifier is trained to predict whether a sentence is a thesis statement or not based on the features extracted from the sentence. However, the number of sentences in the thesis class is usually much lower than those in the not thesis class. Our initial model could not deal adequately with the challenge of class imbalance; there were too few instances of thesis statements from which to learn. Our subsequent model employs synthetic over-sampling in order to address this challenge and improve performance.
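The abstract does not say which synthetic over-sampling technique the authors used. As a rough illustration only, the sketch below assumes a SMOTE-style scheme: each new minority-class (thesis) vector is interpolated between an existing thesis vector and one of its nearest thesis neighbours. The function names, the two-dimensional toy features, and the parameter `k` are all hypothetical and are not taken from the paper.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def synthetic_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority vectors by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority vector and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority vectors by distance to the base vector.
        others = [v for j, v in enumerate(minority) if j != i]
        others.sort(key=lambda v: euclidean(base, v))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 2-D sentence feature vectors, not from the paper.
thesis = [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9]]
not_thesis = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0],
              [0.1, 0.1], [0.2, 0.3], [0.3, 0.2], [0.0, 0.0]]

# Add enough synthetic thesis vectors to match the majority class size.
new_points = synthetic_oversample(thesis, len(not_thesis) - len(thesis))
balanced_thesis = thesis + new_points
```

In a setup like this, over-sampling would normally be applied to the training split only, so that synthetic sentences never leak into the data used for evaluation.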

Similar resources

Discourse Element Identification in Student Essays based on Global and Local Cohesion

We present a method of using cohesion to improve discourse element identification for sentences in student essays. New features for each sentence are derived by considering its relations to global and local cohesion, which are created by means of cohesive resources and subtopic coverage. In our experiments, we obtain significant improvements on identifying all discourse elements, especially of ...

Identifying Thesis and Conclusion Statements in Student Essays to Scaffold Peer Review

Peer-reviewing is a recommended instructional technique to encourage good writing. Peer reviewers, however, may fail to identify key elements of an essay, such as thesis and conclusion statements, especially in high school writing. Our system identifies thesis and conclusion statements, or their absence, in students’ essays in order to scaffold reviewer reflection. We showed that computational ...

Modeling Thesis Clarity in Student Essays

Recently, researchers have begun exploring methods of scoring student essays with respect to particular dimensions of quality such as coherence, technical errors, and relevance to prompt, but there is relatively little work on modeling thesis clarity. We present a new annotated corpus and propose a learning-based approach to scoring essays along the thesis clarity dimension. Additionally, in or...

A Machine Learning Approach for Identification of Thesis and Conclusion Statements in Student Essays

This study describes and evaluates two essay-based discourse analysis systems that identify thesis and conclusion statements from student essays written on six different essay topics. Essays used to train and evaluate the systems were annotated by two human judges, according to a discourse annotation protocol. Using a machine learning approach, a number of discourse-related features were automa...

Evaluating Multiple Aspects of Coherence in Student Essays

Criterion Online Essay Evaluation Service includes a capability that labels sentences in student writing with essay-based discourse elements (e.g., thesis statements). We describe a new system that enhances Criterion’s capability, by evaluating multiple aspects of coherence in essays. This system identifies features of sentences based on semantic similarity measures and discourse structure. A s...

Publication date: 2016